The aim of this project….
This project used data from 1500 residential property sales in Ames, Iowa between 2006 and 2012. The data set contains 82 explanatory variables of four types: nominal, ordinal, discrete, and continuous. Continuous variables describe the area dimensions of the house and property, such as the size of the lot and the garage. Discrete variables quantify characteristics of the house and property, such as the number of kitchens, baths, bedrooms, and parking spots. Nominal variables generally describe types of materials and locations, such as the name of the neighborhood or the type of foundation. Ordinal variables typically rate the condition and quality of house characteristics and utilities.
We decided to keep this variable continuous rather than convert it to a factor. Converting it to a factor would have led us to drop the “Very Poor” (“1”) level, since that level has only around 4 observations. Keeping the variable continuous lets us retain these observations and so better predict the prices of homes in this category.
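The trade-off described above can be illustrated with a minimal sketch. The ratings and the `MIN_LEVEL_COUNT` cutoff below are hypothetical values chosen for illustration, not the project's data or its actual rule for deciding when a level is too sparse.

```python
from collections import Counter

# Hypothetical toy ratings (1 = "Very Poor"); not the project's data.
ratings = [1, 5, 5, 6, 7, 5, 6, 8, 5, 7, 6, 5, 8]

# Assumed cutoff: a factor level with fewer rows than this is treated as too sparse.
MIN_LEVEL_COUNT = 2

counts = Counter(ratings)
sparse_levels = sorted(level for level, n in counts.items() if n < MIN_LEVEL_COUNT)

# Factor treatment: rows that fall in a sparse level would be dropped.
kept_as_factor = [r for r in ratings if r not in sparse_levels]

# Numeric treatment: every row is kept and the rating is used as a number.
kept_as_numeric = list(ratings)

print("sparse levels:", sparse_levels)
print("rows kept as factor:", len(kept_as_factor))
print("rows kept as numeric:", len(kept_as_numeric))
```

In this toy case the single “1” rating survives only under the numeric treatment, which mirrors the reasoning given for keeping the variable continuous.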
Sale Price graph
This dataset has many lot area outliers, as shown above. We found 127 outliers greater than the minimum outlier value of 1300. Because these made visualization difficult, we temporarily removed them. After removing the outliers, lot area has a roughly normal distribution centered near the median of 9453 square feet.
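The text does not say how the minimum outlier value was derived. One common choice is the upper fence of the 1.5 × IQR rule, sketched below under that assumption. The lot areas are made-up values, and the fence rule itself is an assumption rather than the confirmed method.

```python
import statistics

# Hypothetical lot areas in square feet; made up for illustration.
lot_areas = [7200, 8400, 9000, 9453, 9600, 10200, 11000, 12500, 31000, 115000]

# Quartiles by linear interpolation over the observed range.
q1, _, q3 = statistics.quantiles(lot_areas, n=4, method="inclusive")
iqr = q3 - q1

# Assumed rule: anything above Q3 + 1.5 * IQR counts as an outlier.
upper_fence = q3 + 1.5 * iqr

outliers = [a for a in lot_areas if a > upper_fence]
trimmed = [a for a in lot_areas if a <= upper_fence]

print("upper fence:", upper_fence)
print("outliers:", outliers)
print("median after trimming:", statistics.median(trimmed))
```

Only the upper fence is applied here because the text describes outliers as values greater than a threshold. The trimmed list is meant for plotting only, in line with the temporary removal described above.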
From Figure 3, we see that 1-story homes built in 1946 or later make up the bulk of our dataset: 1079 homes, which is over one-third of our 2911 total observations. Please note that the graphs are interactive, so move your cursor over a graph to see more details. Figure 4 also shows that most homes were built within a 5-year range of 2005.
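The "over one-third" claim can be checked directly from the two counts reported above. The variable names are ours.

```python
one_story_post_1946 = 1079  # count reported in the text
total_observations = 2911   # total reported in the text

share = one_story_post_1946 / total_observations

print(f"{share:.1%}")  # prints 37.1%
print(share > 1 / 3)   # prints True
```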
Summary Statistics
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.000 3.000 3.000 3.511 4.000 5.000
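A six-number summary with the same layout as the output above can be reproduced as in the following sketch. The `six_number_summary` helper and the ratings are hypothetical and are not the data behind the printed values.

```python
import statistics

def six_number_summary(values):
    """Return min, quartiles, mean, and max, keyed like the summary output."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "Min.": min(values),
        "1st Qu.": q1,
        "Median": median,
        "Mean": statistics.fmean(values),
        "3rd Qu.": q3,
        "Max.": max(values),
    }

# Hypothetical ratings on a 2-5 scale; made up for illustration.
ratings = [2, 3, 3, 3, 4, 4, 5, 3, 4]

summary = six_number_summary(ratings)
for name, value in summary.items():
    print(f"{name:>8}: {value:.3f}")
```

The `inclusive` method interpolates linearly between observed values, so the quartiles never fall outside the range of the data.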
Some intro text
Neighborhood
We can observe a large variation in sale price across neighborhoods, and also within individual neighborhoods. Investigating some housing characteristics may give us insight into the price variation observed within neighborhoods.
As expected, sale price increases as overall quality increases.
As one would expect, the newer a home is, the higher its price, on average.
Looking at sale price by home type, we find that 2-story homes built in 1946 or later have the highest median home prices.
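A comparison of median price by home type can be computed as in the sketch below. The home type labels and prices are hypothetical and chosen only to show the grouping step. They are not the project's data.

```python
from collections import defaultdict
import statistics

# Hypothetical (home type, sale price) pairs; made up for illustration.
sales = [
    ("1-Story 1946 & Newer", 150000),
    ("1-Story 1946 & Newer", 172000),
    ("1-Story 1946 & Newer", 139000),
    ("2-Story 1946 & Newer", 210000),
    ("2-Story 1946 & Newer", 245000),
    ("1-1/2 Story Finished", 128000),
    ("1-1/2 Story Finished", 134000),
]

# Group prices by home type.
prices_by_type = defaultdict(list)
for home_type, price in sales:
    prices_by_type[home_type].append(price)

# Median is used rather than mean so a few very expensive homes do not dominate.
median_by_type = {t: statistics.median(p) for t, p in prices_by_type.items()}
top_type = max(median_by_type, key=median_by_type.get)

for home_type, median_price in sorted(median_by_type.items()):
    print(f"{home_type}: {median_price}")
print("highest median:", top_type)
```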
There was only one house that had a rating of for kitchen quality, so this observation was removed from the data. We see a non-linear relationship between kitchen quality and sale price: the higher the kitchen quality, the higher the median sale price.
From Figure 10, we can see that, as expected, there is a gradual positive relationship between lot area and sale price.